ABSTRACT

This paper describes issues in adapting the PUNDIT system, designed originally for message processing,
to a query-answering system for the VOYAGER application. The resulting system, whose architecture and
capabilities are described here, represents a first step towards our goal of demonstrating spoken language
understanding in an interactive problem-solving context.

INTRODUCTION 

The  application  which  we  describe  in  this  paper  is  a  natural  language  interface  to  VOYAGER,  an  expert 
system which provides navigational assistance for the area between MIT and Harvard¹. The application
is  well-suited  to  exploring  the  role  of  natural  language  understanding  in  an  interactive  problem-solving 
context:  the  expert  system  knows  enough  about  its  domain  to  allow  extended  problem-solving  dialogue, 
e.g.  for  a  user  to  find  a  hotel  near  MIT,  find  a  nearby  Italian  restaurant,  and  get  directions  from  the  hotel 
to  the  restaurant  (see  below). 

Our ultimate goal is to process spoken input, and indeed, dialogue is, to use the terms of [Bib88], a
stereotypically  oral  genre.  A  reasonable  strategy  for  incrementally  arriving  at  this  goal  is  to  begin  with 
typed  input.  This  approach  allows  us  to  explore  general  issues  in  handling  dialogue,  and  further  allows 
us  to  look  at  ways  in  which  characteristics  of  the  genre,  aspects  of  the  task,  the  informational  structure 
of  the  utterance  and  the  discourse  context  can  all  be  used  to  limit  the  search  space  for  spoken  language 
recognition. 

The following dialogue illustrates a typical problem-solving session with VOYAGER (note: utterances labelled
‘VFE’ are contributed by the VOYAGER front-end, discussed shortly).


User:  Is  there  a  hotel  near  MIT? 

VFE  :  I’ll  ask  Voyager... 

Vgr  :  I  can’t  determine  if  objects  are  near  something. 

I’ll  try  to  find  the  nearest  one. 

The nearest hotel to MIT is Marriott.

Marriott is located at the intersection of Broadway Street and Third
Street.


User: Is there an Italian restaurant near the hotel?

VFE  :  I’ll  ask  Voyager... 

Vgr  :  I  can’t  determine  if  objects  are  near  something. 

I’ll  try  to  find  the  nearest  one. 

The  nearest  Italian  RESTAURANT  to  Marriott  is  Bertucci’s. 

*This work has been supported by DARPA contract N00014-89-C-0171, administered by the Office of Naval Research.
¹The VOYAGER system has been made available to us by Victor Zue of MIT, to whom we acknowledge our gratitude. The
version  of  VOYAGER  described  in  this  paper  is  the  June  89  version.  The  capabilities  of  VOYAGER  have  since  been  significantly 
extended,  and  all  discussion  of  VOYAGER  in  this  paper  applies  only  to  this  particular  version  of  the  system. 
Bertucci’s is located at 799 Main Street.


User: How do I get to the restaurant from the Marriott?

VFE : I’ll ask Voyager...

Vgr : I’ll try to find the best way to get from Marriott to Bertucci’s.

Start from Marriott on Broadway Street.

After you cross Clark Street, take the next left onto Windsor Street.

There is a traffic light there.

After you pass School Street on the right, take the next right onto
Main Street. There is a traffic light there.

Bertucci’s is about forty yards down on your right side.

User: How long would that take?

(etc.) 

MESSAGE  PROCESSING  AND  QUESTION-ANSWERING 

The  PUNDIT  natural  language  processing  system  was  initially  developed  in  the  context  of  message-processing 
applications. Because PUNDIT is a modular system, typical porting tasks include the creation of a domain-specific
lexicon, knowledge base, and semantics rules. Another common feature is basic extensions to
handle constructions which are part of the standard written language, but which have not previously
appeared in the data (e.g. comparatives, superlatives, address expressions). More interesting are the
extensions occasioned by basic differences between messages and face-to-(inter)face conversational
interaction.

Basic  Differences  Between  Messages  and  Voyager  Dialogue 

To fully characterize these differences requires representative data from naturally-occurring messages and
task-oriented dialogue. We have the former; it is our plan to collect the latter, possibly using PUNDIT +
VOYAGER as a data collection vehicle. For present purposes, we may highlight some of the observed
differences and similarities. Our message corpora consist largely of short narratives in what has been called
telegraphic style (cf. [GK86]). As a consequence, we find frequent agentless passives, noun-noun
compounds, nominalizations, run-on sentences, and zeroing of determiners, subjects, copula, and prepositions.
Explicit pronouns and dummy subjects are rare, as are interrogatives, indirect questions, imperatives, and
conditionals. Where temporal relations are explicitly marked, they are commonly marked by preposed
time adverbials. Modals are rare.

In contrast, our initial corpus of user inputs to VOYAGER shows, not surprisingly, that interrogatives and
imperatives are more frequent than statements. The types of zeroing seen in telegraphic narrative do not
occur in the corpus², nor do nominalizations and run-ons. Passives are rare. Dummy it and there are
common, as are I and you, and (in the context of questions about restaurants) they. Preposing (other than
wh-movement) does not occur. Modals are common. Many of these differences are predicted by considering
the two genres to be at opposite ends along Biber’s textual dimensions of ‘involved versus informational
production’, ‘narrative vs. non-narrative concerns’, and ‘abstract vs. non-abstract information’ [Bib88].
We plan to take advantage of such differences to tune the system for the VOYAGER task.

In terms of discourse structure, we might expect there to be radical differences between messages and
dialogue, but in fact here there are interesting similarities. The basic interactional unit in VOYAGER dialogue
can be characterized as the request-response pair (e.g. Where are you? At MIT.). A similar
request-response structure appears in messages containing labelled discourse segments (or header fields), e.g.
Failed Part: system tape. Cause of Failure: tape was wound backwards. Here, the message
originator can be viewed as answering the implicit questions What part failed? What caused the failure?

²Nevertheless, elliptical questions and answers are certainly seen in task-oriented dialogue, e.g. North: [It meant to] ...
basically clear up the record. Nields: Did you? North: Tried to. Nields: Then? North: No ... [ScrS*^
[Bal89] discusses our approach to handling such structured messages as a series of question/answer pairs,
and  we  were  able  to  extend  this  approach  to  dialogue. 

The  interactional  structure  of  monologue  and  dialogue  is,  of  course,  quite  different.  To  provide  the  control 
structure for dialogue, we extended a query front-end (QFE) which had been developed for database query
applications.  The  result  is  a  general  dialogue  manager  which  can  be  used  for  a  variety  of  interactive 
applications. 


SYSTEM  ARCHITECTURE 

The system contains four major components: the VOYAGER front-end (VFE), PUNDIT, a query translation
and interface module (QTIP), and the VOYAGER expert system. The first three components are currently
running on Sun workstations under Quintus Prolog, and VOYAGER, which is written in Lisp, runs on a
Symbolics machine. A simplified system flow diagram is shown in Figure 1.


[Figure 1 (diagram not reproduced): boxes for USER, VFE, PUNDIT, QTIP and VOYAGER. Labelled flows
include Query/Response between USER and VFE; ‘user input + Context’ and ‘NewContext’; ‘New Info +
Context’ and ‘Response/errors’; ‘Vgr resp. + Context’ and ‘NewContext’; and ‘function calls’ and
‘response’ between QTIP and VOYAGER.]

Figure 1: PUNDIT + VOYAGER System Flow

VFE is a dialogue manager, which uses PUNDIT and QTIP as resources to interpret and respond to the
user’s requests. As discussed below, VOYAGER is also a conversational participant, whose utterances must
be analyzed and integrated into the discourse context. VFE administers the turn-taking structure, and
maintains a higher-level model of the discourse than that available to PUNDIT. This level of knowledge
enables it, for example, to call the parser in different modes, depending on preceding discourse (see below).
VFE also keeps track of the current speaker and hearer, so that PUNDIT’s Reference Resolution component
can correctly interpret I and you.
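The role of speaker and hearer tracking in interpreting I and you can be illustrated with a minimal sketch. It is not taken from PUNDIT or VFE (which run under Prolog); the `Turn` class, the `resolve_indexical` function, and the participant names "user" and "system" are all hypothetical, chosen only to show why the same pronoun must resolve differently depending on whose turn it is.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # participant producing the utterance (hypothetical label)
    hearer: str   # participant being addressed (hypothetical label)


def resolve_indexical(pronoun: str, turn: Turn) -> str:
    """Map a first- or second-person pronoun to a participant for this turn."""
    word = pronoun.lower()
    if word in ("i", "me", "my"):
        return turn.speaker
    if word in ("you", "your"):
        return turn.hearer
    raise ValueError(f"not a first- or second-person pronoun: {pronoun!r}")


# "How do I get there?" is spoken by the user.
user_turn = Turn(speaker="user", hearer="system")
# "Where are you?" is asked by the front-end.
front_end_turn = Turn(speaker="system", hearer="user")
```

Under these assumptions, I in the user's turn and you in the front-end's turn both pick out the user, which is what allows a clarification question and its answer to be about the same individual.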

PUNDIT, as described in ([HPD+89], [PDF+86], [Dah86]), provides syntactic, semantic, and pragmatic
interpretation. The input to PUNDIT is currently text, and the output is a set of semantic representations
and other predications representing the discourse context (the DISCOURSE list), and a list of entities in
focus, ordered by saliency (the FOCUS LIST).

QTIP’s function is to translate PUNDIT representations into LISP function calls, to pass these to VOYAGER,
and to return VOYAGER’s response to VFE. QTIP also incorporates some knowledge about VOYAGER’s
capabilities which enables it to trap certain types of queries for appropriate action by VFE. For example,
VOYAGER cannot answer direction requests with an unspecified starting point, unless it knows where the
user is. In this case, QTIP informs VFE that it must elicit the user’s location. As another example, VOYAGER
cannot answer questions about whether a class of objects is located on a street. QTIP traps such questions,
and VFE informs the user:

User:  Is  there  a  subway  station  on  Church  Street? 

VFE : Sorry, Voyager can’t determine whether something is on a street.


QTIP also monitors the state of the machine-machine interface to VOYAGER, and notifies VFE when the
link is down or VOYAGER is not loaded; VFE then notifies the user.
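The trapping behaviour described for QTIP can be sketched roughly as follows. This is an illustration only, written in Python rather than the Prolog of the actual system; the query kinds ("directions", "on_street"), the argument names, the `Outcome` actions, and the s-expression format produced by `to_lisp_call` are all invented for the example and do not reflect the real interface to VOYAGER.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    kind: str   # hypothetical query type, e.g. "directions" or "on_street"
    args: dict  # hypothetical argument names, e.g. {"source": ..., "destination": ...}


@dataclass
class Outcome:
    action: str   # "send", "elicit_location", or "refuse"
    payload: str  # Lisp call text, or a message for the front-end to show


def to_lisp_call(kind: str, args: dict) -> str:
    """Render a query as an s-expression string (invented call format)."""
    parts = " ".join(f":{key} {value}" for key, value in sorted(args.items()))
    return f"({kind} {parts})" if parts else f"({kind})"


def route(query: Query, user_location: Optional[str]) -> Outcome:
    # Questions about whether a class of objects is on a street are trapped.
    if query.kind == "on_street":
        return Outcome(
            "refuse",
            "Sorry, Voyager can't determine whether something is on a street.",
        )
    args = dict(query.args)
    # Direction requests need a starting point: use the known user location,
    # or tell the front-end that it must elicit one.
    if query.kind == "directions" and args.get("source") is None:
        if user_location is None:
            return Outcome("elicit_location", "Where are you?")
        args["source"] = user_location
    return Outcome("send", to_lisp_call(query.kind, args))
```

In this sketch the translator never calls the expert system for a request it knows cannot be answered; it hands control back to the dialogue manager with a reason, which mirrors the division of labour between QTIP and VFE described above.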

The  final  component  is  the  VOYAGER  expert  system,  a  version  of  which  has  been  made  available  to  us  by 
MIT.  VOYAGER  includes  a  generation  component,  and  one  of  our  initial  issues  was  how  to  deal  with  its 
output.  Clearly,  it  would  be  a  poor  sort  of  interactive  system  that  did  not  allow  for  ordinary  anaphoric 
and  definite  reference  to  entities  introduced  in  the  course  of  the  conversation,  e.g.  to  Royal  East  in  the 
example  below: 

A: Is there a restaurant near here?

B: ... The nearest restaurant to MIT is Royal East ...

A:  What  kind  of  food  does  it  serve? 

However, in order to integrate VOYAGER’s conversational contribution into the discourse context, we were
faced with the choice of (a) modifying VOYAGER to return some semantic representation of its utterances
(together with information relevant to focusing), or (b) simply treating VOYAGER as an ordinary
conversational participant, and using PUNDIT to analyze what VOYAGER said. The latter proved the more
expedient course of action. Though not the most efficient approach in the long run, it has the unexpected
benefit of allowing us to look at issues on both sides of the dialogue.

MAINTAINING  THE  DISCOURSE  CONTEXT 


Interactive  systems  must  provide  for  the  strategies  by  which  speakers  refer  to  what  has  been  previously 
discussed, e.g. one-anaphora:

Q1. Is there a restaurant that serves seafood near MIT? ... Q2. Is there one near Harvard?

A  system  which  does  not  handle  anaphoric  and  definite  references  requires  unnatural  behaviour  on  the 
part  of  the  user,  who  will  be  forced  to  refer  to  all  entities  as  though  they  were  brand-new  in  the  discourse, 
e.g.  Q2’: 

Q2’.  Is  there  a  restaurant  that  serves  seafood  near  Harvard? 
But it is clearly not sufficient to allow reference only to what the user has said, for the system’s response
forms  part  of  the  user’s  discourse  context,  as  illustrated  above  and  below. 

These points, of course, are not specific to the application or domain, but are inherent in language and
dialogue. For this reason, the discourse processing facilities of PUNDIT are largely embodied in the
domain-independent Reference Resolution module. This module, which required no significant extensions for the
VOYAGER application, is capable of handling a variety of referential strategies. In the following annotated
dialogue, we illustrate some of the capabilities of Reference Resolution and the importance of
understanding both sides of the dialogue.

1 User: Is there a restaurant near Harvard?

2 VFE : I’ll ask Voyager...

3 Vgr : I can’t determine if objects are near something.

I’ll try to find the nearest one.

The nearest restaurant to Harvard University is Border Cafe.

Border Cafe is located at 32 Church Street.


Comments: VOYAGER’s response is processed by PUNDIT and integrated into the discourse
context. As a result, the Border Cafe enters the list of entities in focus, and is thus available
for anaphoric reference in (4) below.

4 User: What kind of food does it serve?

5 VFE : I’ll ask Voyager...

6 Vgr : Border Cafe serves Mexican food.


Comments: it - the user is referring to the Border Cafe, an entity introduced by VOYAGER in
turn (3). Reference Resolution quickly finds this entity, which is at the head of the Focus List.


7 User: How do I get there?

8 VFE : I’ll ask Voyager...

Where are you?

9 User: At MIT.

10 VFE : Ok. I will tell Voyager...

11 Vgr : MIT is located in the area of 77 Massachusetts Avenue.

I’ll try to find the best way to get from MIT to Border Cafe.

If your car is on the same side of the street as MIT, start driving.
Take the next left onto Massachusetts Avenue. There is a traffic light
there.

After you cross Garden Street, take the next right onto Church Street.
Border Cafe is about one eighth mile down on your left side.


Comments: there in (7) is interpreted as an anaphoric reference to a salient location. Note the
clarification dialogue (VOYAGER will need to know the starting point), where VFE becomes the
querier. Since there is a salient wh-query in the context (Where are you?), VFE anticipates a
short response³ and calls the parser and semantics in a special mode. The user’s short response
is analyzed and bound to the variable in the query (where), and VFE creates a new proposition:
The user is at MIT. This information, together with the translation of the original query (How
do I get there?) is passed to VOYAGER.

³For purposes of the present discussion, a short response is one which is not a complete sentence, e.g. What is your name?
Mary Smith vs. My name is Mary Smith. See [Hiz79] for a discussion of short responses.
12 User: What’s the phone number?

13  VFE  :  I’ll  ask  Voyager... 

14  Vgr  :  The  telephone  number  of  Border  Cafe  is  864-6100. 


Comments: the phone number is a definite reference to an inferrable entity (cf. [Pri81]). There
is  no  previously  mentioned  telephone  number,  and  yet  it  is  a  stereotypic  assumption  that  certain 
classes  of  objects,  e.g.  commercial  establishments,  have  phone  numbers.  This  information  is 
encoded  in  our  knowledge  base.  Reference  Resolution  looks  for  previously-mentioned  entities 
that  have  the  property  of  having  phone  numbers,  and  finds  the  Border  Cafe. 

15 User: Is there a subway stop near the restaurant?

16 VFE : I’ll ask Voyager...

17 Vgr : I can’t determine if objects are near something.

I’ll try to find the nearest one.

The nearest subway stop to Border Cafe is Harvard Station.

Harvard Station is located at the intersection of Massachusetts

Avenue and Church Street.

Comments:  the  restaurant  is  a  definite  reference  to  the  Border  Cafe.  Note  that  it  would  not 
be  correct  to  look  for  the  last  explicit  mention  of  a  restaurant,  for  this  algorithm  would  find 
the  restaurant  introduced  in  turn  1:  Is  there  a  restaurant  near  Harvard?  Instead,  Reference 
Resolution  looks  for  the  salient  entity  of  type  restaurant,  and  finds  the  Border  Cafe. 
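The three referential strategies illustrated in the annotated dialogue (pronouns, definite descriptions, and inferrable entities) can be sketched over a saliency-ordered focus list. The sketch below is not PUNDIT's Reference Resolution module; the `Entity` class, the `HAS_ATTRIBUTE` table standing in for the knowledge base, the contents and ordering of the example focus list, and the three function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str


# Hypothetical knowledge base: attributes that entities of a given kind
# are stereotypically assumed to have.
HAS_ATTRIBUTE = {
    "restaurant": {"phone number"},
    "hotel": {"phone number"},
}


def resolve_pronoun(focus: List[Entity]) -> Optional[Entity]:
    """A pronoun such as 'it' picks out the head of the focus list."""
    return focus[0] if focus else None


def resolve_definite(kind: str, focus: List[Entity]) -> Optional[Entity]:
    """'the restaurant': the most salient entity of the right kind,
    not the most recent explicit mention of the word."""
    return next((e for e in focus if e.kind == kind), None)


def resolve_inferrable(attribute: str, focus: List[Entity]) -> Optional[Entity]:
    """'the phone number': the most salient entity whose kind is assumed
    to have that attribute."""
    return next(
        (e for e in focus if attribute in HAS_ATTRIBUTE.get(e.kind, set())),
        None,
    )


# Assumed state of the focus list after turn 11, most salient first.
focus_list = [
    Entity("Border Cafe", "restaurant"),
    Entity("MIT", "university"),
    Entity("Harvard University", "university"),
    Entity("unspecified restaurant from turn 1", "restaurant"),
]
```

Because the list is ordered by saliency rather than by order of mention, the definite description the restaurant reaches the Border Cafe before the non-specific restaurant of turn 1, which is the behaviour the comments on turns 15 to 17 call for.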

CURRENT  COVERAGE  AND  ISSUES 

Current coverage on our training corpus of 174 VOYAGER inputs is 57%; that is, 100 of the utterances are
correctly processed by VFE, PUNDIT and QTIP, and are sent to VOYAGER. We plan to increase coverage
to 90%, and will then test the system on a more extensive corpus of spontaneous queries, collected by
MIT. In addition, we plan to collect and analyze samples of task-oriented dialogue, in order to evaluate
the system performance in providing a co-operative interface for interactive problem-solving.

One  of  the  issues  which  we  anticipate  addressing  is  the  interpretation  of  additional  types  of  questions,  e.g. 
short  questions.  For  example,  the  notions  of  focus  and  open  proposition  (cf.  [Pri86])  seem  crucial  to  the 
interpretation of sequences such as Are there any bars around MIT? ... How about clubs? ... Restaurants?.

We  also  intend  to  further  explore  the  relationship  between  the  information  structure  of  the  question  and 
the  answer,  in  order  to  optimize  the  processing  of  the  answer.  For  example,  when  VFE  asks  a  clarification 
question such as Where are you?, we construct an open proposition whose variable is typed as a location⁴.
We can therefore anticipate a cohesive response from the user in which the open proposition is satisfied
by a location. A short response is maximally cohesive (MIT or At MIT), and we currently anticipate the
possibility of this type of response, and call the parser first for an NP fragment, and then (if that fails) for
a PP fragment. We could go farther by expecting certain types of prepositional phrases, namely location
expressions (e.g. By subway would be an incoherent response). For the long response (I am at MIT),
we may expect the focus to be accented, and the material corresponding to the open proposition to be
de-accented. Such expectations should prove useful for speech recognition.
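The strategy of trying an NP fragment first and then a PP fragment, and binding the result to the typed variable of the open proposition, can be sketched as below. This is a toy illustration, not the PUNDIT parser: the `OpenProposition` class, the tiny `LEXICON`, the preposition sets, and the predicate name "located_at" are invented. The restriction to locative prepositions implements the possible extension mentioned above, not the current behaviour of the system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OpenProposition:
    predicate: str      # hypothetical predicate name, e.g. "located_at"
    subject: str        # e.g. "user"
    variable_type: str  # semantic type required of the answer, e.g. "location"


# Toy lexicon mapping words to semantic types (assumed for illustration).
LEXICON = {"mit": "location", "harvard": "location", "subway": "vehicle"}
PREPOSITIONS = {"at", "in", "near", "by"}
LOCATIVE_PREPOSITIONS = {"at", "in", "near"}


def parse_np(text: str) -> Optional[str]:
    """Accept a one-word noun phrase that is in the toy lexicon."""
    word = text.strip().rstrip(".").lower()
    return word if word in LEXICON else None


def parse_pp(text: str) -> Optional[Tuple[str, str]]:
    """Accept a preposition followed by a one-word noun phrase."""
    words = text.strip().rstrip(".").lower().split()
    if len(words) == 2 and words[0] in PREPOSITIONS and parse_np(words[1]):
        return words[0], words[1]
    return None


def bind_short_response(text: str, query: OpenProposition) -> Optional[tuple]:
    """Try an NP fragment first, then a PP fragment, then bind the variable."""
    noun = parse_np(text)
    if noun is None:
        pp = parse_pp(text)
        if pp is None:
            return None
        preposition, noun = pp
        # Suggested extension: only location expressions are coherent here.
        if preposition not in LOCATIVE_PREPOSITIONS:
            return None
    if LEXICON[noun] != query.variable_type:
        return None
    return (query.predicate, query.subject, noun)


where_are_you = OpenProposition("located_at", "user", "location")
```

With these assumptions, both MIT and At MIT yield the proposition that the user is at MIT, while By subway is rejected as an answer to Where are you?.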

These are important research issues, and we also believe that they bear directly on the ultimate
performance of the system, in terms of user satisfaction and the time needed to complete the task accurately.

⁴Intuitively, an open proposition is an expression containing an unbound variable.